1 Session 8B: Real time scheduling and performance analysis: Energy efficient real-time
scheduling

Amit Sinha, Anantha P. Chandrakasan 

November 2001 Proceedings of the 2001 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(163.68 KB) Additional Information: full citation, abstract, references, citings, index terms

Real-time scheduling on processors that support dynamic voltage and frequency scaling is 
analyzed. The Slacked Earliest Deadline First (SEDF) algorithm is proposed and it is shown
that the algorithm is optimal in minimizing processor energy consumption and maximum 
lateness. An upper bound on the processor energy savings is also derived. Real-time 
scheduling of periodic tasks is also analyzed and optimal voltage and frequency allocation 
for a given task set is determined that guarantees schedulab ... 

2 Energy-aware systems: Energy-efficient, utility accrual scheduling under resource
constraints for mobile embedded systems

Haisang Wu, Binoy Ravindran, E. Douglas Jensen, Peng Li 

September 2004 Proceedings of the fourth ACM international conference on Embedded 
software 

Full text available: pdf(379.20 KB) Additional Information: full citation, abstract, references, index terms

We present an energy-efficient real-time scheduling algorithm called the Resource- 
constrained Energy-Efficient Utility Accrual Algorithm (or ReUA). ReUA considers an 
application model where activities are subject to time/utility function-time constraints, 
resource dependencies including mutual exclusion constraints, and statistical performance 
requirements including probabilistically satisfied, activity (timeliness) utility bounds. 
Further, ReUA targets mobile embedded systems where syste ... 

Keywords: energy-efficient scheduling, real-time systems, time/utility functions, utility 
accrual scheduling 



3 Compile-time dynamic voltage scaling settings: opportunities and limits
Fen Xie, Margaret Martonosi, Sharad Malik 

May 2003 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2003 conference
on Programming language design and implementation, volume 38 issue 5

Full text available: pdf(291.26 KB) Additional Information: full citation, abstract, references, citings, index terms

With power-related concerns becoming dominant aspects of hardware and software design, 
significant research effort has been devoted towards system power minimization. Among
run-time power-management techniques, dynamic voltage scaling (DVS) has emerged as 
an important approach, with the ability to provide significant power savings. DVS exploits 
the ability to control the power consumption by varying a processor's supply voltage (V) 
and clock frequency (f). DVS controls energy by scheduling diffe ... 

Keywords: analytical model, compiler, dynamic voltage scaling, low power, mixed-integer
linear programming 



4 Low-power system design: Task scheduling and voltage selection for energy
minimization 

Yumin Zhang, Xiaobo Sharon Hu, Danny Z. Chen 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(185.38 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we present a two-phase framework that integrates task assignment, ordering 
and voltage selection (VS) together to minimize energy consumption of real-time 
dependent tasks executing on a given number of variable voltage processors. Task 
assignment and ordering in the first phase strive to maximize the opportunities that can be 
exploited for lowering voltage levels during the second phase, i.e., voltage selection. In the 
second phase, we formulate the VS problem as an Integer Progra ... 

Keywords: task scheduling, voltage selection 



5 Intraprogram dynamic voltage scaling: Bounding opportunities with analytic modeling
Fen Xie, Margaret Martonosi, Sharad Malik 

September 2004 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 1 Issue 3

Full text available: pdf(98Q. 11 KB) Additional Information: full citation, abstract, references, index terms

Dynamic voltage scaling (DVS) has become an important dynamic power-management 
technique to save energy. DVS tunes the power-performance tradeoff to the needs of the 
application. The goal is to minimize energy consumption while meeting performance needs. 
Since CPU power consumption is strongly dependent on the supply voltage, DVS exploits 
the ability to control the power consumption by varying a processor's supply voltage and 
clock frequency. However, because of the energy and time overhead asso ... 

Keywords: Analytical model, compiler, dynamic voltage scaling, low power, mixed-integer 
linear programming 



6 Procrastination scheduling in fixed priority real-time systems
Ravindra Jejurikar, Rajesh Gupta 

June 2004 ACM SIGPLAN Notices, Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools, volume 39 issue 7

Full text available: pdf(115.60 KB) Additional Information: full citation, abstract, references, citings, index terms

Procrastination scheduling has gained importance for energy efficiency due to the rapid 
increase in the leakage power consumption. Under procrastination scheduling, task 
executions are delayed to extend processor shutdown intervals, thereby reducing the idle 
energy consumption. We propose algorithms to compute the maximum procrastination
intervals for tasks scheduled by either the fixed priority or the dual priority scheduling 
policy. We show that dual priority scheduling always guarantees longe ... 

Keywords: critical speed, fixed priority, leakage power, low power scheduling, 
procrastination scheduling, real-time systems



7 Profile-Based Dynamic Voltage Scheduling Using Program Checkpoints
A. Azevedo, I. Issenin, R. Cornea, R. Gupta, N. Dutt, A. Veidenbaum, A. Nicolau 

March 2002 Proceedings of the conference on Design, automation and test in Europe 

Full text available: pdf(611.08 KB) Additional Information: full citation, abstract
Publisher Site

Dynamic voltage scaling (DVS) is a known effective mechanism for reducing CPU energy
consumption without significant performance degradation. While a lot of work has been done
on inter-task scheduling algorithms to implement DVS under operating system control, new
research challenges exist in intra-task DVS techniques under software and compiler control.
In this paper we introduce a novel intra-task DVS technique under compiler control
using program checkpoints. Checkpoints are generated at compile time ...

8 Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous
Distributed Real-time Embedded Systems 

Le Yan, Jiong Luo, Niraj K. Jha 

November 2003 Proceedings of the 2003 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(206.16 KB) Additional Information: full citation, abstract, citings, index terms

Dynamic voltage scaling (DVS) is a powerful technique for reducing dynamic power
consumption in a computing system. However, as technology feature size continues to scale,
leakage power is increasing and will limit power savings obtained by DVS alone. Previous
system-level real-time scheduling approaches use DVS alone to optimize power consumption
without considering leakage power. To overcome this limitation, we propose a
new scheduling algorithm that combines DVS and adaptive body biasing (ABB) to si ...

9 A trace-based binary compilation framework for energy-aware computing
Lian Li, Jingling Xue

June 2004 ACM SIGPLAN Notices, Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools, volume 39 issue 7

Full text available: pdf(441.18 KB) Additional Information: full citation, abstract, references, index terms

Energy-aware compilers are becoming increasingly important for embedded systems due to 
the need to meet conflicting constraints on time, code size and power consumption. We 
introduce a trace-based, offline compiler framework on binaries and demonstrate its 
benefits in supporting energy optimisations. The key innovation lies in identifying frequently 
executed paths in a binary program and duplicating them as single-entry traces. Separating 
frequently from infrequently executed paths enables the c ... 

Keywords: binary translation, energy optimisation, link-time optimisation, profile-guided 
optimisation, trace 



10 System-level synthesis of low-power hard real-time systems 
Darko Kirovski, Miodrag Potkonjak 

June 1997 Proceedings of the 34th annual conference on Design automation - Volume 
00 

Full text available: pdf(138.77 KB) Additional Information: full citation, abstract, references, citings, index terms
Publisher Site

We present a system-level approach for power optimization under a set of user specified
costs and timing constraints of hard real-time designs. The approach optimizes all three
degrees of freedom for power minimization, namely switching activity, effective capacity and
voltage supply. We first define two key associated optimization problems,
processor allocation and task assignment, and establish their computational complexity.
Efficient algorithms are developed for both system design problems. The stat ...

11 Design space exploration and scheduling for embedded software: Leakage aware
dynamic voltage scaling for real-time embedded systems 
Ravindra Jejurikar, Cristiano Pereira, Rajesh Gupta 

June 2004 Proceedings of the 41st annual conference on Design automation 

Full text available: pdf(109.61 KB) Additional Information: full citation, abstract, references, citings, index terms

A five-fold increase in leakage current is predicted with each technology generation. While 
Dynamic Voltage Scaling (DVS) is known to reduce dynamic power consumption, it also 
causes increased leakage energy drain by lengthening the interval over which a 
computation is carried out. Therefore, for minimization of the total energy, one needs to 
determine an operating point, called the critical speed. We compute processor slowdown
factors based on the critical speed for energy minimization. ... 

Keywords: EDF scheduling, critical speed, leakage power, low power scheduling, 
procrastination, real-time systems



12 Power optimization for real-time and media-rich embedded systems: Off-chip
latency-driven dynamic voltage and frequency scaling for an MPEG decoding
Kihwan Choi, Ramakrishna Soma, Massoud Pedram 

June 2004 Proceedings of the 41st annual conference on Design automation 

Full text available: pdf(365.55 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a dynamic voltage and frequency scaling (DVFS) technique for MPEG 
decoding to reduce the energy consumption using the computational workload 
decomposition. This technique decomposes the workload for decoding a frame into on-chip 
and off-chip workloads. The execution time required for the on-chip workload is CPU 
frequency-dependent, whereas the off-chip workload execution time does not change, 
regardless of the CPU frequency, resulting in the maximum energy savings by setting ... 

Keywords: MPEG decoding, low power, voltage and frequency scaling 



13 Profile-based adaptation for cache decay 
Karthik Sankaranarayanan, Kevin Skadron 

September 2004 ACM Transactions on Architecture and Code Optimization (TACO),
Volume 1 Issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

"Cache decay" is a set of leakage-reduction mechanisms that put cache lines that have not 
been accessed for a specific duration into a low-leakage standby mode. This duration is 
called the decay interval, and its optimal value varies across applications. This paper 
describes an adaptation technique that analytically finds the optimal decay interval through 
profiling, and shows that the most important variables required for finding the optimal 
decay interval can be estimated with a reasonable deg ... 
Keywords: Adaptation, cache decay, interval, leakage power

14 Power modeling and optimization for embedded systems: Memory access scheduling
and binding considering energy minimization in multi-bank memory systems
Chun-Gi Lyuh, Taewhan Kim 

June 2004 Proceedings of the 41st annual conference on Design automation 

Full text available: pdf(158.27 KB) Additional Information: full citation, abstract, references, index terms

Memory-related activity is one of the major sources of energy consumption in embedded 
systems. Many types of memories used in embedded systems allow multiple operating 
modes (e.g., active, standby, nap, power-down) to facilitate energy saving. Furthermore, it 
has been known that the potential energy saving increases when the embedded systems 
use multiple memory banks in which their operating modes are controlled independently. In 
this paper, we propose (a compiler-directed) integrated approach t ... 

Keywords: binding, low energy design, scheduling 



15 Dynamic voltage scheduling with buffers in low-power multimedia applications
Chaeseok Im, Soonhoi Ha, Huiseok Kim 

November 2004 ACM Transactions on Embedded Computing Systems (TECS), volume 3
Issue 4

Full text available: pdf(394.39 KB) Additional Information: full citation, abstract, references, index terms

Power-efficient design of multimedia applications becomes more important as they are used 
increasingly in many embedded systems. We propose a simple dynamic voltage scheduling 
(DVS) technique, which suits multimedia applications well and, in case of soft real-time 
applications, allows all idle intervals of the processor to be fully exploited by using buffers. 
Our main theme is to determine the minimum buffer size to maximize energy saving in 
three cases: (i) single task, (ii) multiple subtask ... 

Keywords: Buffer requirement estimation, dynamic voltage scheduling, low-power 
systems, multimedia applications 

16 Reconfigurable system: Energy/power estimation of regular processor arrays
Steven Derrien, Sanjay Rajopadhye 

October 2002 Proceedings of the 15th international symposium on System Synthesis 

Full text available: pdf(909.01 KB) Additional Information: full citation, abstract, references, index terms

We propose a high-level analytical model for estimating the energy and/or power 
dissipation in VLSI processor (systolic) array implementations of loop programs, particularly 
for implementations on FPGA based CO-processors. We focus on the respective impact of 
the array design parameters on the overall off-chip i/o traffic and the number and sizes of 
the local memories in the array. The model is validated experimentally and shows good 
results (12.7% RMS error in the predictions). 

Keywords: design space exploration, power estimation, processor array partitioning, 
programmable logic 

17 Tuning garbage collection for reducing memory system energy in an embedded java
environment

G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, M. Wolczko 
November 2002 ACM Transactions on Embedded Computing Systems (TECS), volume 1
Issue 1

Full text available: pdf(740.23 KB) Additional Information: full citation, abstract, references, index terms

Java has been widely adopted as one of the software platforms for the seamless integration 
of diverse computing devices. Over the last year, there has been great momentum in 
adopting Java technology in devices such as cellphones, PDAs, and pagers where optimizing 
energy consumption is critical. Since, traditionally, the Java virtual machine (JVM), the 
cornerstone of Java technology, is tuned for performance, taking into account energy 
consumption requires reevaluation, and possibly redesign of t ...

Keywords: Garbage collector, Java Virtual Machine (JVM), K Virtual Machine (KVM), low 
power computing 



18 Exploiting Resonant Behavior to Reduce Inductive Noise

March 2004 ACM SIGARCH Computer Architecture News, Proceedings of the 31st
annual international symposium on Computer architecture, volume 32 issue 2

Full text available: pdf(190.42 KB) Additional Information: full citation, abstract



Inductive noise in high-performance microprocessors is a reliability issue caused by
variations in processor current (di/dt) which are converted to supply-voltage glitches by
impedances in the power-supply network. Inductive noise has been addressed by using
decoupling capacitors to maintain low impedance in the power supply over a wide range of
frequencies. However, even well-designed power supplies exhibit (a few) peaks of high
impedance at resonant frequencies caused by RLC resonant loops. Previous a ...

19 Superscalar architectures: Reducing the complexity of the register file in dynamic
superscalar processors

Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi 

December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: pdf(1.34 MB) Additional Information: full citation, abstract, references, citings
Publisher Site 

Dynamic superscalar processors execute multiple instructions out-of-order by looking for 
independent operations within a large window. The number of physical registers within the 
processor has a direct impact on the size of this window as most in-flight instructions 
require a new physical register at dispatch. A large multi-ported register file helps improve 
the instruction-level parallelism (ILP), but may have a detrimental effect on clock speed, 
especially in future wire-limited technologies. ... 

20 Optimization: Transistor sizing of energy-delay-efficient circuits
Paul I. Penzes, Mika Nystrom, Alain J. Martin 

December 2002 Proceedings of the 8th ACM/IEEE international workshop on Timing 
issues in the specification and synthesis of digital systems 

Full text available: pdf(207.24 KB) Additional Information: full citation, abstract, references, index terms

This paper studies the problem of transistor sizing of CMOS circuits optimized for energy- 
delay efficiency, i.e., for optimal Et^n where E is the energy consumption and t is the delay of
the circuit, while n is a fixed positive optimization index that reflects the chosen trade-off
between energy and delay. We propose a set of analytical formulas that closely approximate
the optimal transistor sizes. We then study an efficient iteration procedure that can furt ... 

Keywords: energy-delay optimization, transistor sizing 

21 Energy-oriented compiler optimizations for partitioned memory architectures
V. Delaluz, M. Kandemir, N. Vijaykrishnan, M. J. Irwin 

November 2000 Proceedings of the 2000 international conference on Compilers, 
architectures, and synthesis for embedded systems 

Full text available: pdf(749.73 KB) Additional Information: full citation, citings



22 Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor 
Power Reduction 

Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, Dean M. 
Tullsen 

December 2003 Proceedings of the 36th Annual IEEE/ACM International Symposium on 
Microarchitecture 

Full text available: pdf(295.12 KB) Additional Information: full citation, abstract, index terms
Publisher Site 

This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a
mechanism to reduce processor power dissipation. Our design incorporates heterogeneous
cores representing different points in the power/performance design space; during an
application's execution, system software dynamically chooses the most appropriate core to
meet specific performance and power requirements. Our evaluation of this architecture
shows significant energy benefits. For an objective function that optimi ...

23 Scheduling techniques for embedded systems: Scheduler-based DRAM energy
management

V. Delaluz, A. Sivasubramaniam, M. Kandemir, N. Vijaykrishnan, M. J. Irwin 
June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(172.10 KB) Additional Information: full citation, abstract, references, citings, index terms

Previous work on DRAM power-mode management focused on hardware-based techniques 
and compiler-directed schemes to explicitly transition unused memory modules to low- 
power operating modes. While hardware-based techniques require extra logic to keep track 
of memory references and make decisions about future mode transitions, compiler-directed 
schemes can only work on a single application at a time and demand sophisticated program 
analysis support. In this work, we present an operating system (OS) ... 

Keywords: DRAM, energy estimation, energy management, operating systems, scheduler



24 An optimal memory allocation scheme for scratch-pad-based embedded systems 
Oren Avissar, Rajeev Barua, Dave Stewart 

November 2002 ACM Transactions on Embedded Computing Systems (TECS), volume 1
Issue 1

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This article presents a technique for the efficient compiler management of software-exposed 
heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers 
and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal 
DRAM, external DRAM, and ROM are visible directly to the software, without automatic 
management by a hardware caching mechanism. Instead, the memory units are mapped to 
different portions of the address space. Caches are avoided due to the ... 

Keywords: Memory, allocation, embedded, heterogeneous, storage 



25 Caches and Memory Systems: Heterogeneous memory management for embedded
systems

Oren Avissar, Rajeev Barua, Dave Stewart 

November 2001 Proceedings of the 2001 international conference on Compilers,
architecture, and synthesis for embedded systems

Full text available: pdf(241.12 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper presents a technique for the efficient compiler management of software-exposed 
heterogeneous memory. In many lower-end embedded chips, often used in micro- 
controllers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, 
internal DRAM, external DRAM and ROM are visible directly to the software, without 
automatic management by a hardware caching mechanism. Instead the memory units are 
mapped to different portions of the address space. Caches are avoided because of th ... 

Keywords: embedded, heterogeneous, memory, storage 



26 An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches 
Changkyu Kim, Doug Burger, Stephen W. Keckler 

October 2002 Proceedings of the 10th international conference on Architectural
support for programming languages and operating systems, volume 37, 30,
36 Issue 10, 5, 5

Full text available: pdf(1.33 MB) Additional Information: full citation, abstract, references, citings

Growing wire delays will force substantive changes in the designs of large caches. 
Traditional cache architectures assume that each level in the cache hierarchy has a single, 
uniform access time. Increases in on-chip communication delays will make the hit time of 
large on-chip caches a function of a line's physical location within the cache. Consequently, 
cache access times will become a continuum of latencies rather than a single discrete 
latency. This non-uniformity can be exploited to provide ... 

27 A Geometric Approach to Multi-Criterion Reinforcement Learning 
Shie Mannor, Nahum Shimkin 

August 2004 The Journal of Machine Learning Research, volume 5 
Full text available: pdf(459.68 KB) Additional Information: full citation, abstract, index terms

We consider the problem of reinforcement learning in a controlled Markov environment with
multiple objective functions of the long-term average reward type. The environment is 
initially unknown, and furthermore may be affected by the actions of other agents, actions 
that are observed but cannot be predicted beforehand. We capture this situation using a 
stochastic game model, where the learning agent is facing an adversary whose policy is 
arbitrary and unknown, and where the reward function is ve ... 

28 Memory optimization: A post-compiler approach to scratchpad mapping of code 
Federico Angiolini, Francesco Menichelli, Alberto Ferrero, Luca Benini, Mauro Olivieri
September 2004 Proceedings of the 2004 international conference on Compilers, 
architecture, and synthesis for embedded systems 

Full text available: pdf(154.30 KB) Additional Information: full citation, abstract, references, index terms

Scratchpad Memories (SPMs) are commonly used in embedded systems because they are 
more energy-efficient than caches and enable tighter application control on the memory 
hierarchy. Optimally mapping code and data to SPMs is, however, still a challenge. This 
paper proposes an optimal scratchpad mapping approach for code segments, which has the 
distinctive characteristic of working directly on application binaries, thus requiring no access 
to either the compiler or the application source code - a c ... 

Keywords: design automation, dynamic programming, embedded design, executable 
patching, memory hierarchy, optimization algorithm, post-compiler processing, power 
saving, scratchpad memory 




29 Issues in partitioning & design space exploration for codesign: Automatic application-
specific instruction-set extensions under microarchitectural constraints
Kubilay Atasu, Laura Pozzi, Paolo Ienne 

June 2003 Proceedings of the 40th conference on Design automation 

Full text available: pdf(656.99 KB) Additional Information: full citation, abstract, references, citings, index terms

Many commercial processors now offer the possibility of extending their instruction set for a 
specific application— that is, to introduce customised functional units. There is a need to 
develop algorithms that decide automatically, from high-level application code, which 
operations are to be carried out in the customised extensions. A few algorithms exist but 
are severely limited in the type of operation clusters they can choose and hence reduce 
significantly the effectiveness of specialisation ... 

Keywords: ASIPs, codesign, customisable processors, instruction-set extensions 



30 A low-power in-order/out-of-order issue queue 
Yu Bai, R. Iris Bahar 

June 2004 ACM Transactions on Architecture and Code Optimization (TACO), volume 1
Issue 2

Full text available: pdf(832.73 KB) Additional Information: full citation, abstract, references, index terms

To better address power concerns, a good design strategy should be flexible enough to 
dynamically reconfigure available resources according to the application's needs such that 
extra power is dissipated only when it is really needed. In this work, we focus on power- 
aware solutions for the issue queue (IQ) in an out-of-order superscalar processor. We
propose two schemes that partition the IQ into FIFOs such that only the instructions at the 
head of each FIFO may request to issue. We then monitor ... 

Keywords: High-performance, instruction issue logic, low power 

31 Session 9B: Power issues in high level synthesis: Transient power management
through high level synthesis

Vijay Raghunathan, Srivaths Ravi, Anand Raghunathan, Ganesh Lakshminarayana 

November 2001 Proceedings of the 2001 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(221.14 KB) Additional Information: full citation, abstract, references, citings, index terms

The use of nanometer technologies is making it increasingly important to consider transient 
characteristics of a circuit's power dissipation (e.g., peak power, and power gradient or 
differential) in addition to its average power consumption. Current transient power analysis 
and reduction approaches are mostly at the transistor- and logic-levels. We argue that, as 
was the case with average power minimization, architectural solutions to transient power 
problems can complement and significan ... 

32 The NYU Ultracomputer— designing a MIMD, shared-memory parallel machine
(Extended Abstract)

Allan Gottlieb, Ralph Grishman, Clyde P. Kruskal, Kevin P. McAuliffe, Larry Rudolph, Marc Snir 
April 1982 Proceedings of the 9th annual symposium on Computer Architecture 

Full text available: pdf(1.36 MB) Additional Information: full citation, abstract, references, citings, index terms

We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine 
composed of thousands of autonomous processing elements. This machine uses an 
enhanced message switching network with the geometry of an Omega-network to 
approximate the ideal behavior of Schwartz's paracomputer model of computation and to 
implement efficiently the important fetch-and-add synchronization primitive. We outline the 
hardware that would be required to build a 4096 processor system using 1990' ... 

33 Using an oracle to measure potential parallelism in single instruction stream programs
Alexandru Nicolau, Joseph A. Fisher 

December 1981 Proceedings of the 14th annual workshop on Microprogramming 

Full text available: pdf(848.06 KB) Additional Information: full citation, abstract, references, citings, index terms

Horizontally microprogrammable CPUs belong to a class of machines having statically 
schedulable parallel instruction execution (SPIE machines). Several experiments have 
shown that within basic blocks, real code only gives a potential speed-up factor of 2 or 3 
when compacted for SPIE machines, even in the presence of unlimited hardware. In this 
paper, similar experiments are described. However, these measure the potential parallelism 
available using any global compaction method, that is, one ... 

34 Hardware-managed register allocation for embedded processors
Xiaotong Zhuang, Tao Zhang, Santosh Pande 

June 2004 ACM SIGPLAN Notices, Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools, volume 39 issue 7

Full text available: pdf(265.98 KB) Additional Information: full citation, abstract, references, index terms

Most modern processors (either embedded or general purpose) contain higher number of 
physical registers than those exposed in the ISA. Due to a variety of reasons, this 
phenomenon is likely to continue especially on embedded systems where encoding space is 
very limited. Saving the encoding space leads to lower power consumption in the I-cache; 
on the other hand, harnessing more physical registers saves power in the memory 
subsystem and reduces latency as well. These design decisions however resu ... 

Keywords: architected registers, embedded systems, physical registers, power 

35 Survey of code-size reduction methods

Arpad Beszedes, Rudolf Ferenc, Tibor Gyimothy, Andre Dolenc, Konsta Karsisto 
September 2003 ACM Computing Surveys (CSUR), volume 35 issue 3 

Full text available: pdf(443.89 KB) Additional Information: full citation , abstract , references , index terms

Program code compression is an emerging research activity that is having an impact in 
several production areas such as networking and embedded systems. This is because the 
reduced-sized code can have a positive impact on network traffic and embedded system 
costs such as memory requirements and power consumption. Although code-size reduction 
is a relatively new research area, numerous publications already exist on it. The methods 
published usually have different motivations and a variety of appli ... 

Keywords: code compaction, code compression, method assessment, method evaluation 



36 Speculative software management of datapath-width for energy optimization
Gilles Pokam, Olivier Rochecouste, Andre Seznec, François Bodin

June 2004 ACM SIGPLAN Notices , Proceedings of the 2004 ACM SIGPLAN/SIGBED
conference on Languages, compilers, and tools, volume 39 issue 7

Full text available: pdf(609.97 KB) Additional Information: full citation , abstract , references , index terms

This paper evaluates managing the processor's datapath-width at the compiler level by 
means of exploiting dynamic narrow-width operands. We capitalize on the large occurrence 
of these operands in multimedia programs to build static narrow-width regions that may be 
directly exposed to the compiler. We propose to augment the ISA with instructions directly 
exposing the datapath and the register widths to the compiler. Simple exception 
management allows this exposition to be only speculative. In thi ... 

Keywords: clock-gating, compiler, energy management, narrow-width regions,
reconfigurable computing, speculative execution 

37 Method-level phase behavior in java workloads
Andy Georges, Dries Buytaert, Lieven Eeckhout, Koen De Bosschere 

October 2004 ACM SIGPLAN Notices , Proceedings of the 19th annual ACM SIGPLAN
Conference on Object-oriented programming, systems, languages, and
applications, Volume 39 Issue 10

Full text available: pdf(695.63 KB) Additional Information: full citation , abstract , references , index terms

Java workloads are becoming more and more prominent on various computing devices. 
Understanding the behavior of a Java workload which includes the interaction between the 
application and the virtual machine (VM), is thus of primary importance during performance 
analysis and optimization. Moreover, as contemporary software projects are increasing in 
complexity, automatic performance analysis techniques are indispensable. This paper 
proposes an off-line method-level phase analysis approach for ... 

38 Processors: Predictable performance in SMT processors
Francisco J. Cazorla, Peter M.W. Knijnenburg, Rizos Sakellariou, Enrique Fernandez, Alex 
Ramirez, Mateo Valero 

April 2004 Proceedings of the first conference on computing frontiers on Computing 
frontiers 

Full text available: pdf(307.27 KB) Additional Information: full citation , abstract , references , index terms

Current instruction fetch policies in SMT processors are oriented towards optimization of
overall throughput and/or fairness. However, they provide no control over how individual
threads are executed, leading to performance unpredictability, since the IPC of a thread 
depends on the workload it is executed in and on the fetch policy used. From the point of 
view of the Operating System (OS), it is the job scheduler that determines how jobs are 
executed. However, when the OS runs on an SMT processor ... 

Keywords: ILP, SMT, multithreading, operating systems, performance predictability, real 
time, thread-level parallelism 



39 Low-energy for deep-submicron address buses 
Luca Macchiarulo, Enrico Macii, Massimo Poncino 

August 2001 Proceedings of the 2001 international symposium on Low power 
electronics and design 

Full text available: pdf(314.03 KB) Additional Information: full citation , references , citings , index terms



40 Pen computing: a technology overview and a vision 
Andre Meyer 

July 1995 ACM SIGCHI Bulletin, volume 27 issue 3 

Full text available: pdf(5.14 MB) Additional Information: full citation , abstract , citings , index terms

This work gives an overview of a new technology that is attracting growing interest in public 
as well as in the computer industry itself. The visible difference from other technologies is in 
the use of a pen or pencil as the primary means of interaction between a user and a 
machine, picking up the familiar pen and paper interface metaphor. From this follows a set 
of consequences that will be analyzed and put into context with other emerging 
technologies and visions. Starting with a short historic ...

41 Methods for true power minimization
Robert W. Brodersen, Mark A. Horowitz, Dejan Markovic, Borivoje Nikolic, Vladimir Stojanovic 
November 2002 Proceedings of the 2002 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(50o.33 KB) Additional Information: full citation , abstract , references , citings , index terms

This paper presents methods for efficient power minimization at circuit and micro-
architectural levels. The potential energy savings are strongly related to the energy profile 
of a circuit. These savings are obtained by using gate sizing, supply voltage, and threshold 
voltage optimization, to minimize energy consumption subject to a delay constraint. The 
true power minimization is achieved when the energy reduction potentials of all tuning 
variables are balanced. We derive the sensitivity of ene ... 

42 Energy-aware compiling and scheduling: CPU scheduling for statistically-assured real-time
performance and improved energy efficiency

Haisang Wu, Binoy Ravindran, E. Douglas Jensen, Peng Li 

September 2004 Proceedings of the 2nd IEEE/ACM/IFIP international conference on 
Hardware/software codesign and system synthesis 

Full text available: pdf(171.24 KB) Additional Information: full citation , abstract , references , index terms

We present a CPU scheduling algorithm, called Energy-efficient Utility Accrual Algorithm (or 
EUA), for battery-powered, embedded real-time systems. We consider an embedded 
software application model where repeatedly occurring application activities are subject to 
deadline constraints specified using step time/utility functions. For battery-powered 
embedded systems, system-level energy consumption is also a primary concern. We 
consider CPU scheduling that (1) provides assurances on individ ... 

Keywords: energy-efficient scheduling, real-time systems, time/utility functions, utility 
accrual scheduling 



43 The importance of being square
Clyde P. Kruskal, Marc Snir 

January 1984 ACM SIGARCH Computer Architecture News , Proceedings of the 11th
annual international symposium on Computer architecture, volume 12 issue 3

Full text available: pdf(662.13 KB) Additional Information: full citation , abstract , references , index terms

We present a theory that defines performance of packet-switching interconnection networks
(delay and capacity) and their cost in terms of their geometry. This is used to prove that 
square banyan networks have optimal performance/cost ratio. These results, together with 
some known results on the complexity of routing in multistage networks, show that 
multistage shuffle-exchange networks are the unique networks with both optimal 
performance and simple routing. Finally, square delta networks a ... 

44 A profile-based energy-efficient intra-task voltage scheduling algorithm for real-time
applications

Dongkun Shin, Jihong Kim 

August 2001 Proceedings of the 2001 international symposium on Low power 
electronics and design 

Full text available: pdf(82.57 KB) Additional Information: full citation , references , citings , index terms



45 Cycle-accurate simulation of energy consumption in embedded systems 
Tajana Simunic, Luca Benini, Giovanni De Micheli 

June 1999 Proceedings of the 36th ACM/IEEE conference on Design automation 

Full text available: pdf(777.35 KB) Additional Information: full citation , references , citings , index terms



46 Pipeline damping: a microarchitectural technique to reduce inductive noise in supply
voltage

Michael D. Powell, T. N. Vijaykumar 

May 2003 ACM SIGARCH Computer Architecture News , Proceedings of the 30th
annual international symposium on Computer architecture, volume 31 issue 2

Full text available: pdf(272.13 KB) Additional Information: full citation , abstract , references , citings

Scaling of CMOS technology causes the power supply voltages to fall and supply currents to 
rise at the same time as operating speeds are increasing. Falling supply voltages cause 
noise margins to decrease, while increasing current and frequency makes supply noise 
injection larger, especially noise caused by inductance in the supply lines. Creating power 
distribution systems is one of the key challenges in modern chip design. Decoupling 
capacitance helps reduce inductance effects, but there is of ... 

47 Scheduling and resource allocation: Energy-efficient soft real-time CPU scheduling for
mobile multimedia systems

Wanghong Yuan, Klara Nahrstedt 

October 2003 Proceedings of the nineteenth ACM symposium on Operating systems 
principles 

Full text available: pdf(511.80 KB) Additional Information: full citation , abstract , references , index terms

This paper presents GRACE-OS, an energy-efficient soft real-time CPU scheduler for mobile 
devices that primarily run multimedia applications. The major goal of GRACE-OS is to 
support application quality of service and save energy. To achieve this goal, GRACE-OS 
integrates dynamic voltage scaling into soft real-time scheduling and decides how fast to 
execute applications in addition to when and how long to execute them. GRACE-OS makes 
such scheduling decisions based on the probability dist ... 

Keywords: mobile computing, multimedia, power management

June 2002 ACM SIGPLAN Notices , Proceedings of the joint conference on Languages,
compilers and tools for embedded systems: software and compilers for 
embedded systems, volume 37 issue 7 

Full text available: pdf(384.92 KB) Additional Information: full citation , references , citings , index terms

As energy consumption has become a major constraint in current system design, it is
essential to look beyond the traditional low-power circuit and architectural optimizations. 
Further, software is becoming an increasing portion of embedded/portable systems. 
Consequently, optimizing the software in conjunction with the underlying low-power 
hardware features such as voltage scaling is vital. In this paper, we present two compiler- 
directed energy optimization strategies based on voltage scaling: stat ... 

Keywords: energy-aware compilation, loop transformations, optimizing compilers, voltage 
scaling 

49 DB-IR-1 (databases and information retrieval): indexing and query processing efficiency:
Energy management schemes for memory-resident database systems
Jayaprakash Pisharath, Alok Choudhary, Mahmut Kandemir 

November 2004 Proceedings of the Thirteenth ACM conference on Information and 
knowledge management 

Full text available: pdf(251.47 KB) Additional Information: full citation , abstract , references , index terms

With the tremendous growth of system memories, memory-resident databases are 
increasingly becoming important in various domains. Newer memories provide a structured 
way of storing data in multiple chips, with each chip having a bank of memory modules. 
Current memory-resident databases are yet to take full advantage of the banked storage 
system, which offers a lot of room for performance and energy optimizations. In this paper, 
we identify the implications of a banked memory environment in sup ... 

Keywords: DRAM, database, energy, hardware energy scheme, multiquery optimization, 
power consumption, query-directed energy management 



50 Low power SOCs and NOCs: Disk drive energy optimization for audio-video
applications

Ravishankar Rao, Sarma Vrudhula, Musaravakkam S. Krishnan 

September 2004 Proceedings of the 2004 international conference on Compilers, 
architecture, and synthesis for embedded systems 

Full text available: pdf(653.24 KB) Additional Information: full citation , abstract , references , index terms

Earlier techniques for low power speed control in disk drives running audio/video 
applications attempted to either match the drive's speed to the data rate requirement of the 
host application (just-in-time speed), or run it at the maximum drive speed, neither of 
which are energy-optimal in general. Starting from the theory of DC motors, we obtain a 
high-level power model of a disk drive. We then analytically obtain the speed profile 
(function of time) that minimizes the energy required to transf ... 

Keywords: disk drive, low power, multimedia, speed control 



51 Dynamic compilation for energy adaptation
P. Unnikrishnan, G. Chen, M. Kandemir, D. R. Mudgett 

November 2002 Proceedings of the 2002 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(156.76 KB) Additional Information: full citation , abstract , references , index terms

While previous compiler research indicates that significant improvements in energy
efficiency may be possible if properly optimized code is used, the energy constraints under 
which a given application code should be optimized may not always be available at compile- 
time. More importantly, these constraints may change dynamically during the course of 
execution. In this work, we present a dynamic recompilation/linking framework using which 
the energy behavior of a given application can be optimized ... 

52 Power and energy reduction via pipeline balancing
R. Iris Bahar, Srilatha Manne 

May 2001 ACM SIGARCH Computer Architecture News , Proceedings of the 28th
annual international symposium on Computer architecture, volume 29 issue 2

Full text available: pdf(106MB) Additional Information: full citation , abstract , references , citings , index terms

Minimizing power dissipation is an important design requirement for both portable and non- 
portable systems. In this work, we propose an architectural solution to the power problem 
that retains performance while reducing power. The technique, known as Pipeline Balancing 
(PLB), dynamically tunes the resources of a general purpose processor to the needs of the 
program by monitoring performance within each program. We analyze metrics for triggering 
PLB, and detail instruction que ... 

53 Let caches decay: reducing leakage energy via exploitation of cache generational
behavior

Zhigang Hu, Stefanos Kaxiras, Margaret Martonosi 

May 2002 ACM Transactions on Computer Systems (TOCS), volume 20 issue 2 

Full text available: pdf(873.03 KB) Additional Information: full citation , abstract , references , citings , index terms

Power dissipation is increasingly important in CPUs ranging from those intended for mobile
use, all the way up to high-performance processors for high-end servers. Although the bulk
of the power dissipated is dynamic switching power, leakage power is also beginning to be a 
concern. Chipmakers expect that in future chip generations, leakage's proportion of total 
chip power will increase significantly. This article examines methods for reducing leakage 
power within the cache memories of the CPU. Be ... 

Keywords: Cache memories, cache decay, generational behavior, leakage power 



54 Low-energy intra-task voltage scheduling using static timing analysis
Dongkun Shin, Jihong Kim, Seongsoo Lee 

June 2001 Proceedings of the 38th conference on Design automation 

Full text available: pdf(113.20 KB) Additional Information: full citation , abstract , references , citings , index terms

We propose an intra-task voltage scheduling algorithm for low-energy hard real-time
applications. Based on a static timing analysis technique, the proposed algorithm controls 
the supply voltage within an individual task boundary. By fully exploiting all the slack times, 
a scheduled program by the proposed algorithm always complete its execution near the 
deadline, thus achieving a high energy reduction ratio. In order to validate the effectiveness 
of the proposed algorithm, we built a softwa ... 

55 Memory Ordering: A Value-Based Approach
March 2004 ACM SIGARCH Computer Architecture News , Proceedings of the 31st
annual international symposium on Computer architecture, volume 32 issue 2

Full text available: pdf(244.36 KB) Additional Information: full citation , abstract

Conventional out-of-order processors employ a multi-ported, fully-associative load queue to
guarantee correct memory reference order both within a single thread of execution and
across threads in a multiprocessor system. As improvements in process technology and
pipelining lead to higher clock frequencies, scaling this complex structure to accommodate a
larger number of in-flight loads becomes difficult if not impossible. Furthermore, each access
to this complex structure consumes excessive amounts of e ...

56 Communication-Aware Task Scheduling and Voltage Selection for Total Systems
Energy Minimization
Girish Varatkar, Radu Marculescu 

November 2003 Proceedings of the 2003 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(190.05 KB) Additional Information: full citation , abstract , index terms

In this paper, we present an interprocessor communication-aware task scheduling algorithm
applicable to a multiprocessor system executing an application with dependent tasks. Our
algorithm takes the application task graph and the architecture graph as inputs, assigns the
tasks to processors and then schedules them. As main theoretical contribution, the
algorithm we propose reduces the overall systems energy by (i) reducing the total
interprocessor communication and (ii) executing certain cycles at a ...

Keywords: low-power scheduling, dynamic voltage scaling 



57 Energy-performance trade-offs for spatial access methods on memory-resident data
Ning An, Sudhanva Gurumurthi, Anand Sivasubramaniam, Narayanan Vijaykrishnan, Mahmut 
Kandemir, Mary Jane Irwin 

November 2002 The VLDB Journal — The International Journal on Very Large Data
Bases, Volume 11 Issue 3

Full text available: pdf(641.55 KB) Additional Information: full citation , abstract , index terms

The proliferation of mobile and pervasive computing devices has brought energy constraints 
into the limelight. Energy-conscious design is important at all levels of system architecture, 
and the software has a key role to play in conserving battery energy on these devices. With 
the increasing popularity of spatial database applications, and their anticipated deployment 
on mobile devices (such as road atlases and GPS-based applications), it is critical to 
examine the energy implications of spatial ... 

Keywords: Energy optimization, Multidimensional indexing, Resource-constrained 
computing, Spatial data 



58 Cycle time and slack optimization for VLSI-chips
C. Albrecht, B. Korte, J. Schietke, J. Vygen 

November 1999 Proceedings of the 1999 IEEE/ACM international conference on 
Computer-aided design 

Additional Information: full citation , abstract , references , citings , index terms

We consider the problem of finding an optimal clock schedule, i.e. optimal arrival times for
clock signals at latches of a VLSI chip. We describe a general model which includes all 
previously considered models. Then we show how to optimize the cycle time and optimally 
balance slacks on data paths and on clocktree paths. The problem of finding a clock
schedule with the optimum cycle time was solved before, either by linear programming or 
by binary search, using a test for negative ...

methodologies and research networking: Fast, predictable and low energy memory
references through architecture-aware compilation
Peter Marwedel, Lars Wehmeyer, Manish Verma, Stefan Steinke, Urs Helmig 
January 2004 Proceedings of the 2004 conference on Asia South Pacific design 
automation: electronic design and solution fair 2004 

Full text available: pdf(233.00 KB) Additional Information: full citation , abstract , references 

The design of future high-performance embedded systems is hampered by two problems: 
First, the required hardware needs more energy than is available from batteries. Second, 
current cache-based approaches for bridging the increasing speed gap between processors 
and memories cannot guarantee predictable real-time behavior. A contribution to solving 
both problems is made in this paper which describes a comprehensive set of algorithms that 
can be applied at design time in order to maximally exploit ... 

60 Hard real-time scheduling for low-energy using stochastic data and DVS processors
Flavius Gruian 

August 2001 Proceedings of the 2001 international symposium on Low power 
electronics and design 

Full text available: pdf(166.48 KB) Additional Information: full citation , references , citings , index terms